Abstract



We describe a formal design for a logical query language using ψ-terms as data structures to interact effectively and efficiently with a relational database. The structure of ψ-terms provides an adequate representation for so-called complex objects. They generalize conventional terms used in logic programming: they are typed attributed structures, ordered thanks to a subtype ordering. Unification of ψ-terms is an effective means for integrating multiple inheritance and partial information into a deduction process. We define a compact database representation for ψ-terms, representing part of the subtyping relation in the database as well. We describe a retrieval algorithm based on an abstract interpretation of the ψ-term unification process and prove its formal correctness. This algorithm is efficient in that it incrementally retrieves only additional facts that are actually needed by a query, and never retrieves the same fact twice.



Summary

We describe the formal design of a logical query language that uses ψ-terms as data structures to interact effectively and efficiently with a relational database. The structure of ψ-terms provides an adequate representation for so-called complex objects. They generalize the conventional terms used in logic programming: they are typed, attributed structures, ordered by a subtype ordering. Unification of ψ-terms is an effective means of integrating multiple inheritance and partial information into a deduction process. We define a compact database representation for ψ-terms that also represents part of the ordering on types in the database. We describe a data retrieval algorithm based on the abstract interpretation of ψ-term unification and prove its formal correctness. This algorithm is efficient in that it incrementally retrieves only the additional facts that a query needs, and never retrieves the same fact twice.
The difficulty lay in the form and economy of it, so to
dispose such a multitude of materials as not to make
a confused heap of incoherent parts but one consistent
whole.

Ephraim Chambers, Cyclopaedia



1 Introduction 

1.1 Motivation and contribution 

The combination of logic programming languages and database systems has been a research theme for the last decade in both the logic programming and the database communities. The interest from the logic programming perspective arose when the need was felt for manipulating large sets of facts. Usually, Prolog was coupled with a relational database. In [9], Ceri et al. provide an excellent overview of work in this area. In the database community, it was felt that the logic programming paradigm offers interesting opportunities as a database query language. This resulted in logical query languages like LDL [14] and NAIL! [13].

So-called complex objects have recently been studied for use in database systems [7, 8]. Much of what has been proposed in those studies is derived from earlier work extending first-order terms to ψ-terms [1]. The latter notion has had a more direct application in programming language design [4, 2, 6] than in database systems. Still, the functionality and naturalness of deductive queries over ψ-terms is a strong motivation for providing a logic programming language using ψ-terms with an effective means to access large volumes of data and knowledge stored in a database (see [5] for a convincing example).

We propose a formal design for an effective coupling of such a language with a relational database. For the purpose of our presentation and experimentation, we use the specific language LIFE [2], but this implies no loss of generality. Indeed, although we formulate it using ψ-terms, our design is directly applicable to any logical query language with complex objects represented as Prolog terms or as data structures à la [7, 8], since all these models turn out to be special cases of ψ-terms. We present the theoretical view of our proposed database support of that language and discuss the results. Our theoretical design was put into practice as the basis of an experimental implementation [12].

Although our experiment may be categorized as providing database support to a logic programming language, it goes beyond previous research in that it considers a language with types and attributed terms that can be arbitrarily nested and that provides multiple inheritance. As will be shown, due to the specific characteristics of LIFE's type system, our experiment has yielded a form of database support that not only allows querying for facts, but also posing abstract queries, that is, queries that ask for general knowledge as opposed to factual knowledge.
1.2 Organization of the paper

Before we delve into technicalities, here is a brief introductory overview of the paper. Our system is organized as sketched in Figure 1 and consists of three subsystems, namely the LIFE system, an interface written in LIFE, and an external relational database. The coupled system is intended to represent the facts of LIFE in the database and to retrieve these facts when needed by the LIFE system.

Figure 1. Architecture of the system (LIFE system, interface, relational database; connections labeled "query" and "data").

Hence, the functionality of the interface is twofold. Firstly, it provides a compact database 
representation for logical facts. As we shall see in Section 2, these facts are ordered by a 
subsumption relation induced by a subtype ordering on functors. In Section 3, we propose to 
group facts into what we call qualified segments, such that the subtype relationships involving 
symbols in these facts are implicitly represented. We also compress segments before storage 
in the database. 

Secondly, for the retrieval of facts, we use a tight coupling [15, 16], where facts are loaded when needed by the LIFE system. In Section 4, we describe an abstraction of the unification process, where qualified segments in the database are approximated by a set of generalizations, called the qualifier. If facts from the database are requested, we use the qualifier and the current goal (a term) to construct a candidate, from which we derive a selection condition on the segment that retrieves all facts unifying with this goal. In Section 5, we show that not all subtype relationships need be stored in the LIFE system, since some are implicitly represented in the database. In Section 6, we optimize the retrieval process by storing loaded facts in the internal database and retrieving each fact only once. We conclude in Section 7 with a recapitulation of our work and a brief overview of the perspectives it offers. No particular background is required to understand the technical contents of this paper other than elementary discrete algebra, shreds of logic programming, and basic notions of relational and deductive databases.

2 The facts of LIFE 

LIFE (Logic, Inheritance, Functions, Equations) is a logic programming language extending Prolog terms as described in [2, 4, 6]. The user can specify inclusion relationships between functor symbols, thus enabling the direct representation and use of taxonomic information. Thus, functors are called types and are no longer differentiated from values. For example, we can state that apples is a subtype of food, so that a fact likes(mary, food), stating that mary likes food, implies that mary likes apples as well.

To make use of a subtyping relation in a logic programming language, the unification operation must be redefined. The subtyping relation generates a partial order on the set of all terms, called term subsumption. Unification of two terms computes their greatest lower bound (GLB) with respect to term subsumption. Failure of unification is denoted by a special term: the symbol ⊥ ("bottom").

For the purpose of our presentation, it will suffice to assume that a LIFE program P consists of the specification of the subtype ordering, and logical rules in the form of Horn clauses. The essential point to keep in mind is that the literals making up a program's clauses are ψ-terms rather than conventional Prolog terms. Hence, as is the case in deductive database languages, the Horn clauses are separated into the extensional database (EDB), i.e., the facts containing no variables, and the intensional database (IDB), i.e., the rest.

Our idea is to represent the (presumably numerous) facts of a LIFE program's EDB as flat relations stored in an external relational database. Then, designing an interface amounts to defining an intermediate representation that allows translating from facts of LIFE (i.e., ψ-terms) to database tuples and back. To be correct, a database retrieval algorithm responding to a LIFE query through this interface must be sound (i.e., retrieve no irrelevant tuples) and complete (i.e., retrieve all relevant tuples). Hence, the interface design and the correctness of retrieval depend in some essential way on the formalization of ψ-terms. This section gives all the preliminary formalities that we use, introducing basic and disjunctive ψ-terms, type signatures, subsumption, and related notions. From this point on, whenever we say "term" we shall mean (possibly disjunctive) "ψ-term."¹

2.1 Terms 

A basic term is built out of type symbols and attribute labels. Let ℒ be the set of all attribute labels, and S the set of all type symbols, including ⊤ ("top") and ⊥ ("bottom").

Definition 1 (Basic term) A basic term p is an expression of the form $s(l_1 \Rightarrow p_1, \ldots, l_n \Rightarrow p_n)$, $n \geq 0$, where:

• $s \in S$ is the root symbol of p, denoted by root(p).

• $l_1, \ldots, l_n \in \mathcal{L}$ are pairwise distinct attribute labels.

• $p_1, \ldots, p_n$ are terms: the subterms of p.

If n = 0, p is said to be atomic, and is simply written as s. Otherwise, p is said to be attributed. The attribute-subterm list is unordered. A term with at least one occurrence of the symbol ⊥ is considered to be equal to the term ⊥. We call Ψ the set of all basic terms that can be constructed from type symbols in S and labels in ℒ.

¹More precisely, we shall mean ψ-terms without variables, since only EDB facts will be considered.



Example 2.1 An example of a basic term is:

    likes(who ⇒ mary,
          born ⇒ date(day ⇒ 24,
                      month ⇒ january,
                      year ⇒ 1965),
          what ⇒ apples).

The root symbol is likes; it has three subterms with attribute labels who, born and what. The type symbols are likes, mary, date, 24, january, 1965, and apples. The attribute labels are who, born, day, month, year, and what.



We shall use a more convenient mathematical characterization of a basic term that is formally equivalent to its syntactic representation of Definition 1. It sees a term as a mapping from a set of occurrences (i.e., strings of labels in the free monoid ℒ*) to S, assigning a type symbol to each of these occurrences.



Definition 2 (Occurrence) An occurrence is a string formed by concatenating labels, separated by '.'. The root occurrence is denoted by the empty string ε. The set of all occurrences ℒ* is inductively defined as $\mathcal{L}^* ::= \varepsilon \mid \mathcal{L}.\mathcal{L}^*$, where $a.\varepsilon = \varepsilon.a = a$ for any occurrence a.

In what follows, every time we refer to term p, we mean the generic one in Definition 1 . 

Definition 3 (Occurrence domain) The set of occurrences actually appearing in a term p is the occurrence domain $\Delta_p$: the smallest subset of ℒ* for which:

• $\varepsilon \in \Delta_p$, and

• $l_i.a \in \Delta_p$ iff $l_i$ is the label in p denoting the subterm $p_i$, and $a \in \Delta_{p_i}$.



Definition 4 (Type function) To each term p there corresponds a type function $\psi_p : \mathcal{L}^* \to S$, which assigns a type symbol to each occurrence:

$$
\psi_p(a) =
\begin{cases}
\top & \text{if } a \notin \Delta_p,\\
\mathrm{root}(p) & \text{if } a = \varepsilon,\\
\psi_{p_i}(a') & \text{if } a = l_i.a'.
\end{cases}
$$

Hence, a basic term is formally characterized as a pair $p = (\Delta_p, \psi_p)$.

Example 2.2 Referring to the term in Example 2.1, the domain is {ε, who, born, born.day, born.month, born.year, what}. The type function is defined as: $\psi(\varepsilon) = likes$, $\psi(who) = mary$, $\psi(born) = date$, $\psi(born.day) = 24$, etc. Note that the type function returns the symbol ⊤ for any occurrence not in the occurrence domain, for example $\psi(day.what) = \top$.
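The characterization of a basic term as the pair $(\Delta_p, \psi_p)$ can be illustrated with a small Python sketch. It is a minimal illustration under our own assumptions, not the paper's implementation: a basic term is encoded as a pair (root symbol, dictionary of labelled subterms), occurrences are dot-separated strings with the empty string standing for ε, and the helper names term, domain_and_types and psi are hypothetical.

```python
TOP = "⊤"  # the top type symbol

def term(root, **subterms):
    """Hypothetical constructor: term("likes", who=term("mary")) encodes likes(who ⇒ mary)."""
    return (root, subterms)

def domain_and_types(p):
    """Return Δ_p and ψ_p restricted to Δ_p (Definitions 3 and 4).

    Occurrences are dot-separated label strings; "" plays the role of ε.
    """
    types = {}

    def walk(t, occ):
        root, subs = t
        types[occ] = root
        for label, sub in subs.items():
            walk(sub, f"{occ}.{label}" if occ else label)

    walk(p, "")
    return set(types), types

def psi(types, occ):
    """ψ_p(a): the stored symbol if a is in Δ_p, and ⊤ otherwise."""
    return types.get(occ, TOP)

# The term of Example 2.1 (numbers are encoded as symbol strings).
p = term("likes",
         who=term("mary"),
         born=term("date", day=term("24"), month=term("january"), year=term("1965")),
         what=term("apples"))
dom, types = domain_and_types(p)
print(sorted(dom))
print(psi(types, "born.day"), psi(types, "day.what"))  # 24 ⊤
```

Storing only the occurrences in $\Delta_p$ and defaulting to ⊤ elsewhere mirrors the first case of Definition 4.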

2.2 A short terminological digression 

For the sake of self-containment and to settle some terminology, we indulge in a brief intermezzo defining a few basic order-theoretic notions that we shall use in the rest of this paper. All definitions in this short digression refer to a partially ordered set, or poset, $(S, \leq)$.

Recall that a chain of S is a totally ordered subset of S. Let us also recall the notion of cochain, a dual of the more familiar notion of chain:

Definition 5 (Cochain) A cochain C of S is a subset of S where all distinct elements are mutually incomparable. Formally, $(C \times C) \cap {\leq} = 1_C$.²

The set of all cochains of S is denoted coc(S). The set coc(S) is itself partially ordered as follows.

Definition 6 (Cochain ordering) $\forall C_1, C_2 \in \mathrm{coc}(S)$, $C_1 \sqsubseteq C_2$ iff $\forall x_1 \in C_1, \exists x_2 \in C_2 : x_1 \leq x_2$.

Note that the empty set ∅ is a cochain. In particular, the empty set is the least element in coc(S); that is, $\forall C \subseteq S : \emptyset \sqsubseteq C$.

Note also that singletons of elements of S are cochains too. In fact, the cochain ordering ⊑ coincides with ≤ on singletons; namely, $\forall x, x' \in S : \{x\} \sqsubseteq \{x'\}$ iff $x \leq x'$. For this reason, an element x of S may be identified with the singleton {x}. Hence, the cochain ordering ⊑ is a "natural" extension of the base ordering ≤, and so we shall use only one symbol (≤) indifferently on base elements or cochains of S without risk of confusion.
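Definitions 5 and 6 can be checked mechanically on any finite poset. The sketch below is illustrative only: it uses divisibility on positive integers as a stand-in partial order (an assumption unrelated to the paper's type signature), and the function names are hypothetical.

```python
from itertools import combinations

def divides(x, y):
    """Stand-in partial order for illustration: x ≤ y iff x divides y."""
    return y % x == 0

def is_cochain(c, leq=divides):
    """Definition 5: all distinct elements are mutually incomparable."""
    return all(not leq(x, y) and not leq(y, x) for x, y in combinations(c, 2))

def cochain_leq(c1, c2, leq=divides):
    """Definition 6: C1 ⊑ C2 iff every element of C1 lies below some element of C2."""
    return all(any(leq(x1, x2) for x2 in c2) for x1 in c1)

print(is_cochain({2, 3}), is_cochain({2, 4}))              # True False
print(cochain_leq({2, 3}, {6}), cochain_leq({6}, {2, 3}))  # True False
print(cochain_leq(set(), {5}))                             # True: ∅ is least
```

On singletons the check reduces to the base order, as stated above: cochain_leq({4}, {8}) agrees with divides(4, 8).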

It will be convenient to refer, for a given element of S, to specific subsets of its upper bounds or lower bounds. The following definitions introduce a few that we will use. In what follows, x and x' denote elements of such a set S.

Definition 7 (Ancestors) The set of ancestors of x is the set anc(x) of elements greater than or equal to x:

$$\mathrm{anc}(x) = \{x' \in S \mid x \leq x'\}.$$

²Where $1_X = \{(x, x) \mid x \in X\}$ is the identity relation on X.

Definition 8 (Descendants) The set of descendants of x is the set des(x) of elements smaller than or equal to x:

$$\mathrm{des}(x) = \{x' \in S \mid x' \leq x\}.$$

Given $S' \subseteq S$, let $\lceil S' \rceil$ (resp., $\lfloor S' \rfloor$) denote the set of all its maximal (resp., minimal) elements.³

We define parents and children, as well as maximal common lower bounds and minimal 
common upper bounds, in terms of ancestors and descendants as follows. 

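Definition 9 (Parents and children) The parents of x are its immediate upper bounds; i.e., the minimal ancestors, excluding x itself:

$$\mathrm{par}(x) = \lfloor \mathrm{anc}(x) \setminus \{x\} \rfloor.$$

Dually, the children of x are its immediate lower bounds; i.e., the maximal descendants, excluding x itself:

$$\mathrm{chi}(x) = \lceil \mathrm{des}(x) \setminus \{x\} \rceil.$$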

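Definition 10 (Maximal common lower bounds) The set of maximal common lower bounds of x and x' is denoted $x \sqcap x'$, and defined as:

$$x \sqcap x' = \lceil \mathrm{des}(x) \cap \mathrm{des}(x') \rceil.$$

Definition 11 (Minimal common upper bounds) Dually, the set of minimal common upper bounds of x and x' is denoted $x \sqcup x'$, and defined as:

$$x \sqcup x' = \lfloor \mathrm{anc}(x) \cap \mathrm{anc}(x') \rfloor.$$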

Note that all the sets introduced by the four previous definitions are cochains. 

Finally, given two functions f and f' from a set A to a poset (S, ≤), we say that $f \leq f'$ whenever $\forall a \in A : f(a) \leq f'(a)$.

This concludes our terminological digression. We now return to our topical considerations. 

³To be well-defined, this requires that S not contain infinitely ascending (resp., descending) chains. So we shall implicitly assume this. In fact, all the posets on which we will use these operations will be finite.

2.3 Type signature

The set of type symbols S comes with a subtype ordering ≤. The set S and the ordering form a type signature, a poset Σ = (S, ≤). We may assume the type signature to be fixed.

Definition 12 (Type signature) A type signature Σ is a poset (S, ≤), where:

• S is the set of type symbols, containing top symbol ⊤ and bottom symbol ⊥.

• $\leq \; \subseteq S \times S$ is a partial order on S, the subtyping, such that $\forall s \in S : \bot \leq s \leq \top$.

Example 2.3 In all examples in this paper, we shall use a type signature consisting of the set S = {⊤, ⊥, student, emp, mary, likes, food, apples, sweets, cookies, chocolate} and, as subtyping relation, the least ordering such that apples ≤ food, sweets ≤ food, cookies ≤ sweets and chocolate ≤ sweets, expressing that apples and sweets are food, and cookies and chocolate are sweets; and such that mary ≤ student and mary ≤ emp, expressing that mary is both a student and an employee. This type signature will be referred to as Σ and is depicted in Figure 2.
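Because Σ is small and finite, the operations of Section 2.2 can be computed on it by brute force. In this sketch (our own encoding, not part of the paper), each symbol is mapped to its declared immediate supertypes, symbols without declared supertypes sit directly below ⊤, and ⊥ lies below everything; the helper names are hypothetical.

```python
TOP, BOT = "⊤", "⊥"

# Declared immediate supertypes for the signature of Example 2.3.
DECLARED = {
    "student": set(), "emp": set(), "likes": set(), "food": set(),
    "mary": {"student", "emp"},
    "apples": {"food"}, "sweets": {"food"},
    "cookies": {"sweets"}, "chocolate": {"sweets"},
}
S = set(DECLARED) | {TOP, BOT}

def leq(x, y):
    """Subtyping: reflexive-transitive closure of DECLARED, with ⊥ least and ⊤ greatest."""
    if x == y or x == BOT or y == TOP:
        return True
    if x == TOP or y == BOT:
        return False
    return any(leq(p, y) for p in DECLARED[x])

def maximal(xs):
    return {x for x in xs if not any(x != y and leq(x, y) for y in xs)}

def minimal(xs):
    return {x for x in xs if not any(x != y and leq(y, x) for y in xs)}

def anc(x): return {y for y in S if leq(x, y)}       # Definition 7
def des(x): return {y for y in S if leq(y, x)}       # Definition 8
def par(x): return minimal(anc(x) - {x})             # Definition 9
def chi(x): return maximal(des(x) - {x})             # Definition 9
def meet(x, y): return maximal(des(x) & des(y))      # x ⊓ x' (Definition 10)
def join(x, y): return minimal(anc(x) & anc(y))      # x ⊔ x' (Definition 11)

print(par("mary"), chi("sweets"))
print(meet("student", "emp"), meet("food", "student"), join("apples", "cookies"))
```

For instance, the maximal common lower bounds of student and emp are {mary}, while food and student only share ⊥.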



Figure 2. The type signature Σ.

2.4 Term subsumption

The partial order ≤ on type symbols extends to the set of all terms as follows:

Definition 13 (Basic term subsumption) The basic term subsumption relation ≼ on the set of all basic terms Ψ is defined as $p \preccurlyeq p'$ iff $p = \bot$ or $\psi_p \leq \psi_{p'}$.
Example 2.4 The term:

    $p_1$ = likes(who ⇒ mary, what ⇒ apples)

is subsumed by the term:

    $p_2$ = likes(who ⇒ mary, what ⇒ food)

since apples ≤ food. Term $p_1$ is also subsumed by the term:

    $p_3$ = likes(who ⇒ mary)

since the type symbol is ⊤ for any occurrence that is not in the occurrence domain; i.e., $\psi_{p_1}(what) = apples \leq \psi_{p_3}(what) = \top$. Thus any basic term is subsumed by ⊤ and subsumes ⊥.
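Definition 13 can be checked directly on flattened terms. In this sketch, which uses our own encoding, a basic term is a dictionary from occurrences (dot-separated strings, with "" for ε) to type symbols, missing occurrences implicitly carry ⊤, and the signature of Example 2.3 is re-encoded so the block runs on its own.

```python
TOP, BOT = "⊤", "⊥"
DECLARED = {  # immediate supertypes in the signature of Example 2.3
    "student": set(), "emp": set(), "likes": set(), "food": set(),
    "mary": {"student", "emp"}, "apples": {"food"}, "sweets": {"food"},
    "cookies": {"sweets"}, "chocolate": {"sweets"},
}

def leq(x, y):
    """Subtype ordering on symbols, with ⊥ least and ⊤ greatest."""
    if x == y or x == BOT or y == TOP:
        return True
    if x == TOP or y == BOT:
        return False
    return any(leq(p, y) for p in DECLARED[x])

def subsumed(p, q):
    """Definition 13: p ≤ q iff p is ⊥ or ψ_p(a) ≤ ψ_q(a) at every occurrence.

    Occurrences outside both domains compare ⊤ with ⊤, so only the union matters.
    """
    if BOT in p.values():  # a term containing ⊥ equals ⊥
        return True
    return all(leq(p.get(a, TOP), q.get(a, TOP)) for a in set(p) | set(q))

p1 = {"": "likes", "who": "mary", "what": "apples"}
p2 = {"": "likes", "who": "mary", "what": "food"}
p3 = {"": "likes", "who": "mary"}
print(subsumed(p1, p2), subsumed(p1, p3), subsumed(p2, p1))  # True True False
```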



Note that since S is a subset of Ψ, basic term subsumption coincides with ≤ on it. Therefore, basic term subsumption can be seen as a "natural" extension of the subtype ordering ≤, and we shall again use only one symbol (≤) indifferently on type symbols or basic terms without risk of confusion.

As expected, we now extend terms to cochains of terms. 

Definition 14 (Disjunctive terms) A disjunctive term is a cochain of basic terms.

Term subsumption is naturally extended to disjunctive terms as the cochain ordering of basic 
term subsumption. Hence, by "term" we now shall mean basic or possibly disjunctive term. 

As usual, a singleton disjunctive term {p} is identified with the basic term p. In particular, the singleton set {⊤} is identified with the basic term ⊤. This is natural since they are both greatest elements for term subsumption. Similarly, {⊥} is identified with the basic term ⊥. Again, this is natural since they are both least elements. However, the empty set ∅ is also the least element of coc(Ψ), and hence we can identify all three: ⊥ = {⊥} = ∅.

The following is a particular case of a more general result in [1]. 

Theorem 1 The poset (coc(Ψ), ≤) is a lattice.⁴

⁴Recall that a lattice L is a poset where a unique greatest lower bound and a unique least upper bound both exist in L for any finite non-empty subset of L.
Proof: Greatest lower bounds are constructed as follows. For basic terms p and p', the (possibly disjunctive) term $p \wedge p'$ is the set of maximal elements of the set of all basic terms $u = (\Delta_u, \psi_u)$ such that:

• $\Delta_u = \Delta_p \cup \Delta_{p'}$,

• $\forall a \in \Delta_u : \psi_u(a) \in \psi_p(a) \sqcap \psi_{p'}(a)$.

For (possibly singleton) disjunctive terms C, C', it is given by $C \wedge C' = \lceil \bigcup \{p \wedge p' \mid p \in C, p' \in C'\} \rceil$.

Dually, least upper bounds (LUB) are constructed as follows. For basic terms p and p', the (possibly disjunctive) term $p \vee p'$ is the set of minimal elements of the set of all basic terms $u = (\Delta_u, \psi_u)$ such that:

• $\Delta_u = \Delta_p \cap \Delta_{p'}$,

• $\forall a \in \Delta_u : \psi_u(a) \in \psi_p(a) \sqcup \psi_{p'}(a)$.

For (possibly singleton) disjunctive terms C, C', it is given by $C \vee C' = \lfloor \bigcup \{p \vee p' \mid p \in C, p' \in C'\} \rfloor$. It is easy to verify that these operations are lattice operations with respect to term subsumption. ∎

Note that if the type signature Σ is a lattice, then so is Ψ, and moreover, it is then a sublattice of coc(Ψ).

Example 2.5 The GLB of terms $p_1$ and $p_2$ in Example 2.4 is $p_1$, since $p_1 \leq p_2$. The GLB of likes(who ⇒ student) and likes(who ⇒ emp) is likes(who ⇒ mary). Their LUB is likes(who ⇒ ⊤). The GLB of atomic terms food and student is ⊥; i.e., we cannot unify these.
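The GLB construction in the proof of Theorem 1 can be sketched for basic terms, assuming (as in the dual LUB construction) that the result ranges over the union of the two occurrence domains. Since each set $\psi_p(a) \sqcap \psi_{p'}(a)$ is a cochain, every combination of choices is already maximal, so the sketch simply enumerates them with itertools.product. The encoding and names are our own.

```python
from itertools import product

TOP, BOT = "⊤", "⊥"
DECLARED = {  # immediate supertypes in the signature of Example 2.3
    "student": set(), "emp": set(), "likes": set(), "food": set(),
    "mary": {"student", "emp"}, "apples": {"food"}, "sweets": {"food"},
    "cookies": {"sweets"}, "chocolate": {"sweets"},
}
S = set(DECLARED) | {TOP, BOT}

def leq(x, y):
    """Subtype ordering on symbols, with ⊥ least and ⊤ greatest."""
    if x == y or x == BOT or y == TOP:
        return True
    if x == TOP or y == BOT:
        return False
    return any(leq(p, y) for p in DECLARED[x])

def meet(x, y):
    """x ⊓ y: the maximal common lower bounds of two type symbols."""
    common = {z for z in S if leq(z, x) and leq(z, y)}
    return {z for z in common if not any(z != w and leq(z, w) for w in common)}

def glb(p, q):
    """p ∧ q for basic terms given as dicts from occurrence to symbol.

    The result is a disjunctive term (a list of basic terms); [] stands for ⊥.
    """
    occs = sorted(set(p) | set(q))          # assumed domain: Δ_p ∪ Δ_p'
    choices = [meet(p.get(a, TOP), q.get(a, TOP)) for a in occs]
    if any(c == {BOT} for c in choices):    # one failing occurrence makes it ⊥
        return []
    # Each choice set is a cochain, so every combination is already maximal.
    return [dict(zip(occs, combo)) for combo in product(*map(sorted, choices))]

print(glb({"": "likes", "who": "student"}, {"": "likes", "who": "emp"}))
# [{'': 'likes', 'who': 'mary'}]
print(glb({"": "food"}, {"": "student"}))  # []
```

Both calls reproduce Example 2.5: student and emp unify to mary, while food and student do not unify.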

3 Representation in a database 

We now discuss the storage of facts in an external relational database. 
3.1 Qualified segments 

In a relational database, identically formed objects are grouped together in a relation. We must define a similar grouping on the facts that we store in the external database. Since there is no evident way to express subsumption in relational algebra, we must also find a way to represent in the database the subtype information relevant to the type symbols in these facts. Therefore, if a fact is stored in a database relation, this should imply that particular subtype relationships are defined for the symbols in this fact. Thus we should group facts with similar subtype relationships for their symbols, for example symbols with the same parents or children or both. However, there is a trade-off: the more subtype information is implicitly represented, the more database relations are needed to store all facts.

We choose to group facts with the same set of parents for all symbols at each given occurrence. 
It turns out that this is a natural choice since sharing parents is the most immediate commonality, 
akin to values being of the same type. These sets are called qualified segments: 
Definition 15 (Qualified segment) A qualified segment Q is a set of non-bottom facts such that all facts have the same set of parents for the type symbol at each occurrence:

$$\forall f, f' \in Q, \forall a \in \Delta_f : \mathrm{par}(\psi_f(a)) = \mathrm{par}(\psi_{f'}(a)).$$

A little thought shows that all facts in Q must necessarily be identically formed. Indeed, the occurrence domain is the same for all facts in a qualified segment, since parents are the same for symbols at each occurrence: an occurrence outside $\Delta_{f'}$ carries ⊤, whose set of parents is empty, whereas any other symbol has at least one parent. For a qualified segment Q, the common occurrence domain of all facts is denoted $\Delta_Q$.

For a program P, we can use multiple qualified segments to store part of the facts in P in the database. We store each qualified segment in a separate database relation, and in the interface we store a description of the contents of each segment, called the qualifier. A qualifier is a set of terms that are generalizations of all facts in the qualified segment:

Definition 16 (Qualifier) To a qualified segment Q corresponds a qualifier, denoted qua(Q), which is the LUB of all facts in Q.

Example 3.1 Consider the two LIFE facts likes(who ⇒ mary, what ⇒ sweets) and likes(who ⇒ mary, what ⇒ apples). Since both facts have the same parents for all type symbols, we can represent them in a qualified segment Q = {likes(who ⇒ mary, what ⇒ sweets), likes(who ⇒ mary, what ⇒ apples)}. The qualifier is qua(Q) = likes(who ⇒ mary, what ⇒ food).
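Grouping facts into qualified segments and computing their qualifiers can be sketched as follows. This is an illustration under our own assumptions: facts are flattened dictionaries, the grouping key is the domain together with the parent set at each occurrence (Definition 15), and the qualifier is computed as an n-ary generalization of the LUB construction from the proof of Theorem 1, keeping at each occurrence the minimal common upper bounds of all symbols found there. The extra fact with cookies is hypothetical.

```python
from itertools import product

TOP, BOT = "⊤", "⊥"
DECLARED = {  # immediate supertypes in the signature of Example 2.3
    "student": set(), "emp": set(), "likes": set(), "food": set(),
    "mary": {"student", "emp"}, "apples": {"food"}, "sweets": {"food"},
    "cookies": {"sweets"}, "chocolate": {"sweets"},
}
S = set(DECLARED) | {TOP, BOT}

def leq(x, y):
    """Subtype ordering on symbols, with ⊥ least and ⊤ greatest."""
    if x == y or x == BOT or y == TOP:
        return True
    if x == TOP or y == BOT:
        return False
    return any(leq(p, y) for p in DECLARED[x])

def minimal(xs):
    return {x for x in xs if not any(x != y and leq(y, x) for y in xs)}

def par(x):
    """Parents of x: the minimal strict ancestors (Definition 9)."""
    return minimal({y for y in S if y != x and leq(x, y)})

def segments(facts):
    """Group facts that share the parent set at every occurrence (Definition 15)."""
    groups = {}
    for f in facts:
        key = tuple(sorted((a, frozenset(par(s))) for a, s in f.items()))
        groups.setdefault(key, []).append(f)
    return list(groups.values())

def qualifier(segment):
    """LUB of the facts of a segment, which all share one occurrence domain.

    At each occurrence we keep the minimal common upper bounds of the symbols
    found there; each such set is a cochain, so all combinations are minimal.
    """
    occs = sorted(segment[0])
    choices = []
    for a in occs:
        uppers = {y for y in S if all(leq(f[a], y) for f in segment)}
        choices.append(sorted(minimal(uppers)))
    return [dict(zip(occs, combo)) for combo in product(*choices)]

facts = [
    {"": "likes", "who": "mary", "what": "sweets"},
    {"": "likes", "who": "mary", "what": "apples"},
    {"": "likes", "who": "mary", "what": "cookies"},  # hypothetical extra fact
]
segs = segments(facts)
print(len(segs), qualifier(segs[0]))  # 2 [{'': 'likes', 'what': 'food', 'who': 'mary'}]
```

The cookies fact lands in its own segment because its parent at what is sweets rather than food; its qualifier is the fact itself, which matches the degenerate single-fact case mentioned in the text.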

An important remark is that the qualifier of a qualified segment is always strictly more general than every fact of the segment. This is a consequence of having grouped facts in the same qualified segment if and only if the type symbols at all their occurrences share the same parents.⁵ And thus, as we will see in Section 5, a qualifier and the terms in the corresponding segment implicitly represent subtype relationships.

3.2 Database relations 

A relational database consists of database relations: 

Definition 17 (Database relation) A database relation is a set $\{r_1, r_2, \ldots, r_m\}$ ($m \geq 0$) of n-ary tuples ($n \geq 1$) and is identified by its relation name R and a set of attribute names $T = \{t_1, t_2, \ldots, t_n\}$. For a particular tuple r, the value of attribute t is denoted as r.t.

We store a qualified segment Q in database relation $R_T$ by representing each fact in Q as a tuple in $R_T$. We represent fact f as a tuple r by flattening the fact; i.e., we define a bijective function v, called the attribute function, that maps occurrences in the occurrence domain $\Delta_f$ to attribute names in T. Then, for each occurrence $a \in \Delta_f$, we store type symbol $\psi_f(a)$ in attribute v(a) in tuple r.

⁵More precisely, this is true if the qualified segment is not reduced to only one fact. But then, as we shall see, there is no relation to store in the database.

This representation is sound, but it can be compressed by recognizing that, for particular occurrences in the occurrence domain, the symbols are the same in all facts in the segment. For example, the symbol at occurrence who in Example 3.1 is mary for all facts in Q. This (possibly empty) set of occurrences is the fixed symbol set:



Definition 18 (Fixed symbol set) For qualified segment Q we define the fixed symbol set $D_Q \subseteq \Delta_Q$ as:

$$D_Q = \{a \in \Delta_Q \mid \forall f, f' \in Q : \psi_f(a) = \psi_{f'}(a)\}.$$



Symbols at occurrences in the fixed symbol set $D_Q$ are the same for all facts in qualified segment Q; hence, we do not have to store them in the database. We only store symbols at occurrences not in $D_Q$ and use any basic term in the qualifier to represent the missing symbols. Indeed, for each basic term q in the qualifier and each occurrence a in $D_Q$, the type symbol $\psi_q(a)$ is the LUB of the symbols at a in the facts of Q; since these symbols all coincide, $\psi_q(a)$ is the same as the symbol at this occurrence in every fact in Q.

The correspondence between qualified segment Q and database relation $R_T$ is defined by a data definition:

Definition 19 (Data definition) Given segment Q, the corresponding database relation $R_T$ is defined by a data definition given by the quadruple $(\mathrm{qua}(Q), R, v, D_Q)$.

Data definitions are stored in the interface, thus enabling the representation of facts in segment Q as tuples in $R_T$. With each fact $f = (\Delta_f, \psi_f) \in Q$ corresponds a unique tuple $r \in R_T$, defined by:

$$\forall t \in T : r.t = \psi_f(v^{-1}(t)).$$

Conversely, each database tuple $r \in R_T$ represents a fact $f = (\Delta_Q, \psi_f)$, where the type function $\psi_f$ is defined as:

$$
\psi_f(a) =
\begin{cases}
\top & \text{if } a \notin \Delta_Q,\\
\psi_q(a) & \text{if } a \in D_Q,\\
r.v(a) & \text{otherwise},
\end{cases}
\qquad \text{where } q \in \mathrm{qua}(Q).
$$
Example 3.2 The qualifier for qualified segment Q from Example 3.1 is {likes(who ⇒ mary, what ⇒ food)}, and the fixed symbol set is $D_Q = \{\varepsilon, who\}$. If we represent Q as a database relation $R_T$, we only need to store the symbols at occurrence what, so we need a relation with a single column, say T = {foodname}.

We define the attribute function v as: v(what) = foodname. The representation of Q as a database relation is $R_T$ = {(sweets), (apples)}.
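A minimal sketch of the data definition at work follows, using Python's built-in sqlite3 module as a stand-in for the external relational database (an assumption; the paper does not prescribe a particular system). The table name R and the column foodname come from Example 3.2, while the helper names fixed_symbol_set, store and decode are hypothetical. Table and column names are interpolated into the SQL text and must therefore be trusted identifiers; symbols are passed as parameters.

```python
import sqlite3

def fixed_symbol_set(segment):
    """Definition 18: occurrences carrying the same symbol in every fact."""
    return {a for a in segment[0] if len({f[a] for f in segment}) == 1}

def store(conn, table, segment, v):
    """Store a segment as tuples over the attributes v(a), for a not in D_Q.

    Returns the stored occurrences in column order.
    """
    fixed = fixed_symbol_set(segment)
    occs = sorted(a for a in segment[0] if a not in fixed)
    cols = [v[a] for a in occs]
    conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})",
                     [tuple(f[a] for a in occs) for f in segment])
    return occs

def decode(row, occs, q, fixed):
    """Rebuild a fact from a tuple: symbols at D_Q come from a qualifier term q."""
    fact = {a: q[a] for a in fixed}
    fact.update(zip(occs, row))
    return fact

Q = [{"": "likes", "who": "mary", "what": "sweets"},
     {"": "likes", "who": "mary", "what": "apples"}]
q = {"": "likes", "who": "mary", "what": "food"}   # qua(Q) from Example 3.1
v = {"what": "foodname"}                          # attribute function of Example 3.2

conn = sqlite3.connect(":memory:")
occs = store(conn, "R", Q, v)
rows = conn.execute("SELECT foodname FROM R").fetchall()
facts = [decode(r, occs, q, fixed_symbol_set(Q)) for r in rows]
print(rows)
```

Only the what column reaches the database; the symbols at ε and who are restored from the qualifier when tuples are decoded.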

Note, for the sake of consistency, that in the already mentioned degenerate case of a qualified segment reduced to only one fact, all the information goes into the fixed symbol set and the qualifier, leaving nothing to be stored in the external database.

4 Retrieval algorithm 

For the retrieval of facts from the database, we use a tight coupling, where we load facts from the database whenever they are needed by the inference engine. For a particular goal g, we load the subset Q[g] of segment Q containing all facts in Q that unify with g:

$$Q[g] = \{f \in Q \mid f \wedge g \neq \bot\}.$$

Qualified segment Q is stored in the database, so we do not know its actual contents; hence we cannot compute Q[g] by simply unifying all facts in Q with the goal. So, we need another technique to compute Q[g], independent of the contents of Q. We use an abstract interpretation [11] of the inference process, where we use qualifiers instead of facts. In this abstraction, unification of facts in Q with goal g is an operation on the qualifier and the goal, resulting in a term, called the candidate, which approximates the subset of Q of all facts unifiable with g. We now describe the construction of candidates. First, we define the unifiable set U(s), the set of all type symbols that unify with symbol s; i.e., the symbols whose maximal common subtypes with s are not just ⊥:

Definition 20 (Unifiable set) For a type symbol s in S, we define the unifiable set U(s) as:

$$U(s) = \{s' \in S \mid s \sqcap s' \neq \{\bot\}\}.$$

A candidate is defined such that any fact in the qualified segment that is subsumed by a basic term in the candidate unifies with goal g:

Definition 21 (Candidate) Given a goal g (a basic term), the candidate C is the set of all maximal terms $c = (\Delta_Q, \psi_c)$ that can be constructed from a term q in the qualifier qua(Q) that is unifiable with g, as follows. $\forall a \in \Delta_Q$:

$$
\psi_c(a)
\begin{cases}
= \top & \text{if } a \in D_Q \text{ or } \psi_q(a) \leq \psi_g(a),\\
\in \mathrm{chi}(\psi_q(a)) \cap U(\psi_g(a)) & \text{otherwise.}
\end{cases}
$$



Example 4.1 Assume the goal $g_1$ = likes(what ⇒ cookies) and qualified segment Q as in Example 3.2. By Definition 21, we construct a candidate $C_1$ = ⊤(who ⇒ ⊤, what ⇒ sweets). For goal $g_2$ = likes(who ⇒ student, what ⇒ food), we construct candidate $C_2$ = ⊤(who ⇒ ⊤, what ⇒ ⊤). For goal $g_3$ = likes(who ⇒ peter, what ⇒ apples), we construct candidate $C_3 = \emptyset$.
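Definition 21 can be sketched in Python by re-encoding the signature of Example 2.3 and computing the order operations by brute force. Two assumptions are ours: peter is added directly below ⊤, because Example 4.1 uses it although Σ does not list it, and the segment, qualifier and $D_Q$ of Examples 3.1 and 3.2 are written out literally. The loop at the end checks Theorem 2 on these three goals by comparing the candidate-based selection with direct unification.

```python
from itertools import product

TOP, BOT = "⊤", "⊥"
DECLARED = {
    "student": set(), "emp": set(), "likes": set(), "food": set(),
    "peter": set(),  # assumption: Example 4.1 uses peter without declaring it
    "mary": {"student", "emp"}, "apples": {"food"}, "sweets": {"food"},
    "cookies": {"sweets"}, "chocolate": {"sweets"},
}
S = set(DECLARED) | {TOP, BOT}

def leq(x, y):
    if x == y or x == BOT or y == TOP:
        return True
    if x == TOP or y == BOT:
        return False
    return any(leq(p, y) for p in DECLARED[x])

def maximal(xs):
    return {x for x in xs if not any(x != y and leq(x, y) for y in xs)}

def meet(x, y): return maximal({z for z in S if leq(z, x) and leq(z, y)})
def chi(x): return maximal({z for z in S if z != x and leq(z, x)})
def U(s): return {t for t in S if meet(s, t) != {BOT}}   # Definition 20

def unifiable(p, q):
    """p ∧ q ≠ ⊥ for basic terms: no occurrence has only ⊥ as common subtype."""
    return all(meet(p.get(a, TOP), q.get(a, TOP)) != {BOT} for a in set(p) | set(q))

def candidate(qualifier, fixed, g):
    """Definition 21: candidate terms built from qualifier terms unifiable with g."""
    result = []
    for q in (q for q in qualifier if unifiable(q, g)):
        occs = sorted(q)
        options = []
        for a in occs:
            ga = g.get(a, TOP)
            if a in fixed or leq(q[a], ga):
                options.append([TOP])
            else:
                options.append(sorted(chi(q[a]) & U(ga)))
        result += [dict(zip(occs, combo)) for combo in product(*options)]
    return result

def subsumed(f, c):
    return all(leq(f.get(a, TOP), c.get(a, TOP)) for a in set(f) | set(c))

Q = [{"": "likes", "who": "mary", "what": "sweets"},
     {"": "likes", "who": "mary", "what": "apples"}]
QUA = [{"": "likes", "who": "mary", "what": "food"}]
D_Q = {"", "who"}
g1 = {"": "likes", "what": "cookies"}
g2 = {"": "likes", "who": "student", "what": "food"}
g3 = {"": "likes", "who": "peter", "what": "apples"}
for g in (g1, g2, g3):
    C = candidate(QUA, D_Q, g)
    by_candidate = [f for f in Q if any(subsumed(f, c) for c in C)]
    by_unification = [f for f in Q if unifiable(f, g)]
    assert by_candidate == by_unification   # Theorem 2 on this example
    print(C)
```

The printed candidates correspond to $C_1$, $C_2$ and $C_3$ of Example 4.1, with the root occurrence shown as the key "".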



Thus a candidate contains terms that are identically formed to the facts in the segment and consist of ⊤ symbols and immediate subtypes of symbols in the qualifier; i.e., symbols that appear in facts in Q. If candidate C is empty (for instance, because the terms in the qualifier do not unify with the goal), then the qualified segment does not contain any facts that unify with the goal. We have to prove that any fact f in qualified segment Q that unifies with goal g is subsumed by a basic term c in candidate C, and conversely.

Theorem 2 A fact f in qualified segment Q unifies with goal g iff it is subsumed by a basic term c in candidate C; namely,

$$f \wedge g \neq \bot \iff \exists c \in C : f \leq c.$$

Proof: By Definition 13 and Theorem 1, we can rewrite the above as a condition on the type symbols at each occurrence $a \in \Delta_Q$:

$$\psi_f(a) \sqcap \psi_g(a) \neq \{\bot\} \iff \psi_f(a) \leq \psi_c(a).$$

We first prove that if the maximal common subtype of the two symbols $\psi_f(a)$ and $\psi_g(a)$ is non-bottom, then we can construct a term c such that $\psi_f(a)$ is smaller than the corresponding symbol $\psi_c(a)$ in c.

Symbols $\psi_f(a)$ and $\psi_g(a)$ unify, so $\psi_f(a)$ is in the unifiable set $U(\psi_g(a))$. Symbol $\psi_q(a)$ is larger than $\psi_f(a)$, and thus unifies with $\psi_g(a)$ as well: $\psi_q(a) \in U(\psi_g(a))$. So, by definition, $\psi_c(a)$ is not the symbol ⊥. If $\psi_q(a) \leq \psi_g(a)$, then $\psi_c(a) = \top$ and the claim holds trivially. Assume that occurrence a is in the fixed symbol set $D_Q$. By definition, $\psi_c(a) = \top$, and thus symbol $\psi_f(a)$ is smaller than the symbol $\psi_c(a)$ in c. Alternatively, if occurrence a is not in the fixed symbol set $D_Q$, symbol $\psi_f(a)$ in fact f is a child of $\psi_q(a)$. We also know that $\psi_f(a)$ is in $U(\psi_g(a))$; thus we can construct a term c where $\psi_c(a) = \psi_f(a)$. So we can construct a term c larger than any fact f that unifies with goal g.

We also prove that if fact f in Q does not unify with goal g, we cannot construct a term c larger than f. Fact f and term g do not unify, so for at least one occurrence a, the maximal common subtype of $\psi_f(a)$ and $\psi_g(a)$ is the bottom symbol. We prove that, for this occurrence, we cannot construct a candidate c with $\psi_f(a) \leq \psi_c(a)$.

The symbol $\psi_q(a)$ is a supertype of $\psi_f(a)$. If q and g do not unify, the candidate is empty; thus, it does not subsume any fact. If q and g unify, then $\psi_q(a)$ is in $U(\psi_g(a))$ for every occurrence a in $\Delta_Q$. Symbol $\psi_g(a)$ cannot be a supertype of $\psi_q(a)$; otherwise, $\psi_g(a)$ would be a supertype of $\psi_f(a)$ as well, and their maximal common subtype would be $\psi_f(a)$. Moreover, occurrence a cannot be in the fixed symbol set $D_Q$; otherwise $\psi_f(a) = \psi_q(a) \in U(\psi_g(a))$, contradicting the fact that $\psi_f(a)$ is not in the unifiable set $U(\psi_g(a))$. Hence, the symbol $\psi_c(a)$ in c is not ⊤.

If we could construct a term $c$ larger than $f$, symbol $\psi_c(a)$ would be a child of $\psi_q(a)$ and a member of
the unifiable set $U(\psi_g(a))$. Since occurrence $a$ is not in the fixed occurrence set, $\psi_f(a)$ is also a child
of $\psi_q(a)$. So the only child of $\psi_q(a)$ larger than $\psi_f(a)$ is $\psi_f(a)$ itself. However, $\psi_f(a)$ is not in the
unifiable set $U(\psi_g(a))$, so we cannot construct a term $c$ with $\psi_c(a) \in \mathit{chi}(\psi_q(a)) \cap U(\psi_g(a))$ that
is larger than fact $f$. □
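To make both halves of the proof concrete, here is a minimal Python sketch of candidate construction for flat terms. It rests on simplifying assumptions of ours, not the paper's data structures: a term is a dict from occurrence names to sort symbols, the sort hierarchy is a hypothetical parent map, and each candidate term takes $\top$ (the string "top") at fixed occurrences and a symbol from $\mathit{chi}(\psi_q(a)) \cap U(\psi_g(a))$ elsewhere, as in the proof. All names (`PARENTS`, `candidate`, `subsumed`, ...) are illustrative.

```python
from itertools import product

TOP = "top"  # the top sort; used as the wild card in candidate terms

# Hypothetical sort hierarchy: sort -> set of immediate parents (bottom is implicit).
PARENTS = {
    "food": {TOP}, "sweets": {"food"}, "fruit": {"food"},
    "cookies": {"sweets"}, "chocolate": {"sweets"}, "apples": {"fruit"},
    "person": {TOP}, "mary": {"person"},
}
SORTS = set(PARENTS) | {TOP}


def ancestors(s):
    """All supertypes of s, including s itself."""
    result = {s}
    for p in PARENTS.get(s, ()):
        result |= ancestors(p)
    return result


def leq(s, t):
    """Subtype test s <= t."""
    return t in ancestors(s)


def unifies(s, t):
    """Two sorts unify iff they have a common subtype other than bottom."""
    return any(leq(x, s) and leq(x, t) for x in SORTS)


def unifiable_set(s):
    """U(s): all sorts that unify with s."""
    return {t for t in SORTS if unifies(s, t)}


def children(s):
    """chi(s): the immediate subtypes of s."""
    return {x for x, ps in PARENTS.items() if s in ps}


def terms_unify(f, g):
    """Flat terms over the same occurrences unify iff every pair of symbols unifies."""
    return all(unifies(f[a], g[a]) for a in g)


def subsumed(f, c):
    """f <= c, checked occurrence by occurrence."""
    return all(leq(f[a], c[a]) for a in c)


def candidate(q, fixed, g):
    """Candidate terms for goal g in a segment with qualifier q and fixed occurrences."""
    if not terms_unify(q, g):
        return []  # q and g do not unify: the candidate is empty
    occs = sorted(q)
    choices = [[TOP] if a in fixed else sorted(children(q[a]) & unifiable_set(g[a]))
               for a in occs]
    return [dict(zip(occs, combo)) for combo in product(*choices)]


# Hypothetical segment: qualifier with who => mary (fixed) and what => food.
q = {"who": "mary", "what": "food"}
facts = [{"who": "mary", "what": "sweets"}, {"who": "mary", "what": "fruit"}]
g = {"who": "person", "what": "chocolate"}
cands = candidate(q, {"who"}, g)
for f in facts:
    # A fact is subsumed by some candidate term exactly when it unifies with g.
    print(f, any(subsumed(f, c) for c in cands), terms_unify(f, g))
```

Here the candidate is the single term with $\top$ at `who` and `sweets` at `what`: it subsumes the sweets fact, which unifies with the goal, and not the fruit fact, which does not.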

Corollary 1 If fact f is subsumed by a basic term c in candidate C, all symbols in c are either 
the top symbol, or equal to the corresponding symbol in fact f. 

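Proof: Follows directly from the above proof, since $\psi_c(a)$ is either $\top$ or a child of the symbol $\psi_q(a)$
in the qualifier. For the latter symbols, occurrence $a$ is not in the fixed occurrence set, so symbol $\psi_f(a)$
in fact $f$ is also a child of $\psi_q(a)$. Since $c$ subsumes $f$, $\psi_f(a) \le \psi_c(a)$, and two comparable children
of the same symbol are equal; hence $\psi_c(a) = \psi_f(a)$. □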

The corollary is important, since it states that we can compute $Q[g]$ by a selection with the
candidates, where $\top$ is the wild card argument and non-top symbols are selection arguments.
To a candidate $C$ for data definition $D = (F, R_t, v, D_Q)$ there corresponds a selection
condition $\Gamma[C]$ that is true for all elements of the set $Q[g]$ and false for any other element of $Q$:

$$\Gamma[C] = (\Gamma[c_1]) \;\text{or}\; \cdots \;\text{or}\; (\Gamma[c_p])$$

where $C = \{c_1, \ldots, c_p\}$. For each term $c_i$ we construct a selection condition:

$$\Gamma[c_i] = \big(v(a_1) = \psi_{c_i}(a_1)\big) \;\text{and}\; \cdots \;\text{and}\; \big(v(a_n) = \psi_{c_i}(a_n)\big)$$

where $a_1, \ldots, a_n$ are the occurrences with non-top symbols in term $c_i$. We select the tuples
that represent facts in $Q[g]$ with a simple SQL query:

select t1, ..., tn
from R
where Γ[C]

The retrieved tuples are then translated to facts, as stated in Section 3.2. 
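The construction of $\Gamma[C]$ and of the selection query can be sketched in Python. The sketch assumes that candidate terms are dicts from occurrences to symbols, with the string "top" standing for $\top$, and that `v` maps occurrences to column names; it uses Python's built-in `sqlite3` module in place of the ORACLE database, and all function names are hypothetical. Symbols are passed as query parameters, while table and column names are assumed to be trusted identifiers.

```python
import sqlite3

TOP = "top"  # wild card: no selection on this occurrence


def term_condition(c, v):
    """Gamma[c_i]: conjunction of v(a) = psi_c(a) over the non-top occurrences a of c."""
    parts, params = [], []
    for a, sym in sorted(c.items()):
        if sym != TOP:
            parts.append(f"{v[a]} = ?")
            params.append(sym)
    return (" AND ".join(parts) or "1 = 1"), params  # an all-top term selects every tuple


def candidate_condition(cands, v):
    """Gamma[C]: disjunction of the Gamma[c_i]; an empty candidate selects nothing."""
    if not cands:
        return "1 = 0", []
    parts, params = [], []
    for c in cands:
        cond, ps = term_condition(c, v)
        parts.append(f"({cond})")
        params.extend(ps)
    return " OR ".join(parts), params


def select_query(table, columns, cands, v):
    """Build 'select t1, ..., tn from R where Gamma[C]' with bound parameters."""
    cond, params = candidate_condition(cands, v)
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {cond}", params


# Example 4.2 setting: "who" is fixed (mary), "what" is stored in column foodname.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (foodname TEXT)")
conn.executemany("INSERT INTO R VALUES (?)", [("apples",), ("sweets",)])
sql, params = select_query("R", ["foodname"], [{"who": TOP, "what": "sweets"}],
                           {"what": "foodname"})
rows = conn.execute(sql, params).fetchall()
print(sql, params, rows)
```

On a toy relation holding apples and sweets (the database symbols named in Example 5.1), the query built for the candidate of Example 4.2 retrieves only the tuple (sweets).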



Example 4.2 For the candidate $C_1$ of Example 4.1, we construct the selection condition
$\Gamma[C_1] = (v(what) = \psi_{c_1}(what)) = (foodname = sweets)$. The query is:

select foodname
from R
where foodname = 'sweets'

and returns the tuple (sweets), which is transformed to the fact $likes(who \Rightarrow mary, what \Rightarrow sweets)$.

5 Reduced type signature 

For the construction of candidates, we use the type signature $\Sigma$. Part of the subtype relationships
are implicitly represented in the database; that is, for each fact in a qualified segment, the
parents of all symbols at occurrences not in the fixed occurrence set $D_Q$ are stored in the qualifier.
We do not store these 'implicit' subtype relationships in the LIFE system, but add them when
facts are loaded.

The remaining subtype relationships have to be stored in the LIFE system, since we have to
be able to reconstruct the entire type signature. However, some of the subtype relationships
implicitly stored in the database are needed to construct candidates. Thus we should either
retrieve these relationships from the database at run time, duplicate the necessary
relationships in the LIFE system, or use a combination of both techniques.

We adopt the second strategy, which is simple but probably non-optimal: we store
sufficient subtype relationships in the LIFE system to compute candidates for any goal and
qualifier in program $P$. We construct a reduced type signature $\Sigma' = (S', \le')$, where $S' \subseteq S$
and $\le' \subseteq \le$.

Definition 22 (Reduced type signature) The reduced type signature $\Sigma' = (S', \le')$ is such
that $S'$ is the subset of $S$ from which we may exclude least sorts (parents of bottom) that have a
single parent, are stored in a database relation, and do not occur in a term in a qualifier. The
reduced subtype relation $\le'$ is the subset of $\le$ induced by the set $S'$:

$$\le' \;=\; \le \cap\, (S' \times S').$$
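Definition 22 amounts to a filter over the sort hierarchy. The following Python sketch assumes a hypothetical hierarchy given as a parent map (bottom left implicit) and hypothetical sets of symbols occurring in database relations and in qualifiers; it is not the hierarchy of Figure 3. Where the definition says we *may* exclude, the sketch excludes every eligible sort, and it then restricts $\le$ to $S' \times S'$.

```python
TOP = "top"

# Hypothetical hierarchy: sort -> immediate parents (bottom is implicit).
PARENTS = {
    "food": {TOP}, "sweets": {"food"}, "fruit": {"food"},
    "apples": {"fruit"}, "cookies": {"sweets"}, "chocolate": {"sweets"},
    "person": {TOP}, "student": {"person"}, "emp": {"person"},
    "workstudy": {"student", "emp"},
}
SORTS = set(PARENTS) | {TOP}


def ancestors(s):
    """All supertypes of s, including s itself."""
    result = {s}
    for p in PARENTS.get(s, ()):
        result |= ancestors(p)
    return result


def leq(s, t):
    """The full subtype relation <=."""
    return t in ancestors(s)


def reduce_signature(db_symbols, qualifier_symbols):
    """Return (S', <') following Definition 22."""
    has_child = {p for ps in PARENTS.values() for p in ps}
    least = {s for s in PARENTS if s not in has_child}  # the parents of bottom
    excluded = {s for s in least
                if len(PARENTS[s]) == 1
                and s in db_symbols and s not in qualifier_symbols}
    reduced = SORTS - excluded
    # <' is <= restricted to S' x S'.
    leq_reduced = {(s, t) for s in reduced for t in reduced if leq(s, t)}
    return reduced, leq_reduced


reduced, leq_reduced = reduce_signature(
    db_symbols={"apples", "sweets", "workstudy"},
    qualifier_symbols={"food", "fruit", "person"},
)
print(sorted(SORTS - reduced))
```

In this toy hierarchy only apples is dropped: cookies and chocolate are not stored in the database, and workstudy, although a least sort in the database, has two parents.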

Example 5.1 The reduced type signature $\Sigma'$ is depicted in Figure 3. The least sorts
with a single parent are the symbols likes, mary, apples, cookies and chocolate. The
symbols in the database are apples and sweets. The symbols not in a qualifier are
student, emp, apples, sweets, cookies and chocolate. Hence, the only symbol that is a least
sort with a single parent, occurs in a database relation, and does not occur in a qualifier is apples.

We have to prove that the reduced type signature is complete; that is, all subtype relationships
are represented either in the database or in the reduced type signature. Moreover, we have to
prove that we construct the same candidates with the reduced type signature.

Figure 3. Reduced type signature $\Sigma'$.

Theorem 3 All subtype relationships are either represented in the LIFE system or implicitly 
in the database. 

Proof: Assume a subtype relation $s \le s'$ where $s$ is not in $S'$. By definition, $s$ is a least sort with a
single parent, a symbol in a database relation, and not a symbol in a qualifier. So there is a symbol
$s'' \in S'$, the parent of $s$, at the corresponding occurrence in the qualifier for this database relation,
and $s \le s''$ is a relation implied by this segment. If $s' \ne s$, then $s'' \le s'$, since $s''$ is the only
parent of $s$; and since $s'$ and $s''$ are in $S'$, $s'' \le' s'$. So we can reconstruct $s \le s'$, since $s \le s''$
and $s'' \le' s'$.

Now assume the relation $s \le s'$ where $s'$ is not in $S'$. Since only least sorts are excluded from $S'$,
either $s = s'$, which is trivial, or $s$ must be the bottom symbol, and $\bot \le s'$ is implicitly defined by
the type signature for any $s' \in S$. □
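The argument of Theorem 3 can be checked mechanically on a small example. In this hypothetical Python sketch, the sort apples has been excluded from $S'$, and its single parent, which the paper stores implicitly through the qualifier of the segment holding apples, is kept in a separate map (`IMPLICIT_PARENT`, an illustrative name). The function rebuilds $s \le s'$ from $\le'$ and that parent, and we compare it with the full relation on every pair of sorts; $\bot$ is left implicit, as in the proof.

```python
TOP = "top"

# Hypothetical hierarchy: sort -> immediate parents (bottom is implicit).
PARENTS = {
    "food": {TOP}, "sweets": {"food"}, "fruit": {"food"},
    "apples": {"fruit"}, "cookies": {"sweets"}, "chocolate": {"sweets"},
}
SORTS = set(PARENTS) | {TOP}


def ancestors(s):
    """All supertypes of s, including s itself."""
    result = {s}
    for p in PARENTS.get(s, ()):
        result |= ancestors(p)
    return result


def leq(s, t):
    """The full subtype relation <= of the original signature."""
    return t in ancestors(s)


EXCLUDED = {"apples"}                   # least sorts left out of S'
REDUCED = SORTS - EXCLUDED              # S'
LEQ_REDUCED = {(s, t) for s in REDUCED for t in REDUCED if leq(s, t)}  # <'
# Parent s'' of each excluded sort s, implied by the qualifier of its segment.
IMPLICIT_PARENT = {s: next(iter(PARENTS[s])) for s in EXCLUDED}


def leq_reconstructed(s, t):
    """Rebuild s <= t from <' plus the implicit parents (bottom <= t stays implicit)."""
    if s == t:
        return True
    if s in IMPLICIT_PARENT:            # s not in S': use s <= s'' and s'' <=' t
        return (IMPLICIT_PARENT[s], t) in LEQ_REDUCED
    return (s, t) in LEQ_REDUCED


print(all(leq_reconstructed(s, t) == leq(s, t) for s in SORTS for t in SORTS))
```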

Theorem 4 If we exchange $\Sigma$ for $\Sigma'$, we construct the same candidates for a goal $g$ and a
qualifier $qua(Q)$.

Proof: To construct candidates, we compute the unifiable set $U(s)$ for any symbol $s$ in the goal. We
define $U'(s)$ as the set containing all symbols in $S'$ that unify with $s \in S'$, as defined by the subtype
relation $\le'$. For the correct construction of candidates, $U'(s)$ should contain exactly the symbols in $U(s)$ that
are also in $S'$, that is:

$$\forall s, s' \in S' : \; s' \in U'(s) \iff s' \in U(s)$$

Symbol $s'$ is in $U(s)$ iff the maximal common subtype of $s$ and $s'$ is non-bottom. We prove that for
any $s, s'$ in $S'$, the maximal common subtypes $s \sqcap s'$ form a subset of $S'$, and thus that $s'$ is in $U'(s)$ if $s'$
is in $U(s)$. The set $s \sqcap s'$ is either $\{s\}$, $\{s'\}$, or a set of symbols smaller than both $s$ and $s'$. These
symbols are all in $S'$, since we excluded only symbols with a single parent, and such symbols can
never be a maximal common subtype of two other symbols: if $x$ has a single parent $p$ and lies below both
$s$ and $s'$ with $x \ne s, s'$, then $p$ lies below both as well, so $x$ is not maximal.

Moreover, if $s \sqcap s' = \{\bot\}$ (i.e., $s' \notin U(s)$), then $s'$ is not in the unifiable set $U'(s)$ either, since the
subtype relation $\le'$ in the reduced type signature is a subset of the subtype relation $\le$. □
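A similar check illustrates Theorem 4 and why only single-parent sorts may be dropped. In this hypothetical Python sketch, mary and john are single-parent least sorts excluded from $S'$, while workstudy, the common subtype of student and emp, is kept. For every $s$ in $S'$, $U'(s)$ computed with $\le'$ equals $U(s) \cap S'$. Dropping workstudy as well (not allowed by Definition 22, since it has two parents) would break the equality, because student and emp would then appear non-unifiable.

```python
TOP = "top"

# Hypothetical hierarchy with multiple inheritance: sort -> immediate parents.
PARENTS = {
    "person": {TOP}, "student": {"person"}, "emp": {"person"},
    "workstudy": {"student", "emp"}, "mary": {"student"}, "john": {"emp"},
}
SORTS = set(PARENTS) | {TOP}


def ancestors(s):
    """All supertypes of s, including s itself."""
    result = {s}
    for p in PARENTS.get(s, ()):
        result |= ancestors(p)
    return result


def leq(s, t):
    """The full subtype relation <=."""
    return t in ancestors(s)


def restrict(sorts):
    """Subtype test for <= restricted to sorts x sorts."""
    return lambda s, t: s in sorts and t in sorts and leq(s, t)


def unifiable_set(s, sorts, le):
    """Symbols in `sorts` sharing a common subtype (within `sorts`) with s under `le`."""
    return {t for t in sorts if any(le(x, s) and le(x, t) for x in sorts)}


REDUCED = SORTS - {"mary", "john"}  # S': single-parent least sorts removed
consistent = all(
    unifiable_set(s, REDUCED, restrict(REDUCED)) == unifiable_set(s, SORTS, leq) & REDUCED
    for s in REDUCED
)

# Counter-example: also removing the two-parent sort workstudy.
BROKEN = REDUCED - {"workstudy"}
broken_sees_emp = "emp" in unifiable_set("student", BROKEN, restrict(BROKEN))
print(consistent, broken_sees_emp)
```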

As can be seen in Example 5.1 and Figure 3, simply duplicating all necessary subtyping
information works well for qualified segments containing a large number of facts with least-sort
symbols (i.e., data typically found in databases), since these symbols are not stored in
the reduced type signature. However, we stress that the above solution is non-optimal, since
the reduced type signature $\Sigma'$ contains more subtype information than actually needed. We
believe it is possible to further 'strip down' the reduced type signature. We have in mind a technique
called segment guessing, which needs less subtype information: the retrieval algorithm
queries any database relation that might contain unifiable facts, based on the available subtype
information.

6 Optimization 

To reduce database interaction, we assert loaded facts in the internal LIFE database instead
of retrieving the same facts over and over again. However, if we assert facts in the internal
database, we should retrieve each fact only once. Thus, when querying the database for all
unifiable facts for goal $g_i$ in segment $Q$, we should exclude all facts loaded from $Q$ for previous
goals $g_1, \ldots, g_{i-1}$.

As we stated in Section 4, we can describe each subset $Q[g_i]$ with a selection condition $\Gamma[C_i]$.
Thus we can exclude any subset with the negation of its selection condition. We select the
tuples from the database with an SQL query:

select t1, ..., tn
from R
where Γ[C_i] and not (Γ[C_1])
and ...
and not (Γ[C_{i-1}])
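A minimal sketch of this incremental loading in Python, with the built-in `sqlite3` module standing in for the relational database. It assumes candidate terms are dicts from occurrences to symbols with "top" as the wild card, a hypothetical two-column relation R(name, foodname), and a hypothetical `SegmentLoader` class that remembers the candidates of earlier goals and appends one `NOT (...)` clause per earlier candidate. It also assumes the columns hold no NULLs, since in SQL `NOT` applied to a comparison with NULL is not true.

```python
import sqlite3

TOP = "top"  # wild card: no selection on this occurrence


def term_condition(c, v):
    """Conjunction of v(a) = symbol over the non-top occurrences of term c."""
    parts = [f"{v[a]} = ?" for a, sym in sorted(c.items()) if sym != TOP]
    params = [sym for a, sym in sorted(c.items()) if sym != TOP]
    return (" AND ".join(parts) or "1 = 1"), params


def candidate_condition(cands, v):
    """Disjunction over the terms of a candidate; an empty candidate selects nothing."""
    if not cands:
        return "1 = 0", []
    conds = [term_condition(c, v) for c in cands]
    return " OR ".join(f"({s})" for s, _ in conds), [p for _, ps in conds for p in ps]


class SegmentLoader:
    """Hypothetical loader for one qualified segment stored in a single table."""

    def __init__(self, conn, table, columns, v):
        self.conn, self.table, self.columns, self.v = conn, table, columns, v
        self.previous = []  # candidates C_1, ..., C_{i-1} of earlier goals

    def load(self, cands):
        cond, params = candidate_condition(cands, self.v)
        cond = f"({cond})"  # parenthesize: AND binds tighter than OR
        for prev in self.previous:
            prev_cond, prev_params = candidate_condition(prev, self.v)
            cond += f" AND NOT ({prev_cond})"
            params += prev_params
        self.previous.append(cands)
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table} WHERE {cond}"
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (name TEXT, foodname TEXT)")
conn.executemany("INSERT INTO R VALUES (?, ?)",
                 [("mary", "sweets"), ("mary", "apples"), ("john", "sweets")])
loader = SegmentLoader(conn, "R", ["name", "foodname"], {"who": "name", "what": "foodname"})
first = loader.load([{"who": TOP, "what": "sweets"}])   # goal g_1
second = loader.load([{"who": "mary", "what": TOP}])    # goal g_2
third = loader.load([{"who": TOP, "what": "sweets"}])   # g_1 asked again
print(first, second, third)
```

Across the three goals each tuple is fetched at most once; the third query, which repeats the first goal, returns nothing.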

The set of all candidates for previous goals forms an abstract cache, storing the results of
previous abstract computations, i.e., all constructed candidates. This is also known as the
caching of queries, as described by Ceri et al. in [10]. However, storing all these candidates is
expensive, and therefore we briefly mention a few optimizations.

Instead of storing all previous candidates, we use a single set, called the look-up set, to represent
the part of the qualified segment that has been loaded:

Definition 23 (Look-up set) For a segment $Q$, we define the look-up set $L[i]$ as the set
formed of the maximal terms in the union of candidates $C_1, \ldots, C_i$.
A look-up set is an equivalent but more compact notation for a set of candidates, since any
term subsumed by another term is removed. The SQL query reduces to:

select t1, ..., tn
from R
where Γ[C_i] and not (Γ[L[i-1]])
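Because candidate terms only mix $\top$ with symbols that, by Corollary 1, can be compared for equality, computing the look-up set is a simple maximal-element filter. A minimal Python sketch, assuming dict-based terms over the same occurrences with "top" as the wild card (the function names are hypothetical):

```python
TOP = "top"  # wild card


def subsumes(general, specific):
    """specific <= general: general is top or equal at every occurrence."""
    return all(general[a] in (TOP, specific[a]) for a in general)


def lookup_set(candidates):
    """L[i]: the maximal terms in the union of candidates C_1, ..., C_i."""
    union = []
    for cand in candidates:
        for c in cand:
            if c not in union:  # drop exact duplicates
                union.append(c)
    return [c for c in union
            if not any(d != c and subsumes(d, c) for d in union)]


C1 = [{"who": TOP, "what": "sweets"}]
C2 = [{"who": "mary", "what": "sweets"}, {"who": "mary", "what": "apples"}]
L2 = lookup_set([C1, C2])
print(L2)
```

Since every term dropped from an earlier look-up set is subsumed by a term that was kept, $L[i]$ can also be maintained incrementally as `lookup_set([L_prev, C_i])`.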

Another optimization consists of posing only queries that might retrieve some tuples; that is, we
exclude queries with a contradicting selection condition. This occurs when the current query is
subsumed by a previous query, as described in [10]. The subsumption of queries is defined by
the subtype relation $\le$ on candidates. That is, all facts for goal $g_i$ have been loaded if every term $c$
in candidate $C_i$ is subsumed by some term $c'$ in the look-up set: $\forall c \in C_i, \exists c' \in L[i-1] : c \le c'$.
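The subsumption test that lets us skip a query can be sketched the same way. This minimal Python version assumes dict-based wild-card terms with "top" as $\top$ (names hypothetical). An empty candidate counts as already loaded, which matches the fact that it needs no query.

```python
TOP = "top"  # wild card


def subsumes(general, specific):
    """specific <= general: general is top or equal at every occurrence."""
    return all(general[a] in (TOP, specific[a]) for a in general)


def already_loaded(cand, lookup):
    """True if every term of candidate C_i is subsumed by some term of L[i-1]."""
    return all(any(subsumes(prev, c) for prev in lookup) for c in cand)


lookup = [{"who": TOP, "what": "sweets"}, {"who": "mary", "what": "apples"}]
print(already_loaded([{"who": "john", "what": "sweets"}], lookup))   # True
print(already_loaded([{"who": "john", "what": "sweets"},
                      {"who": "john", "what": "apples"}], lookup))   # False
```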

A third optimization is the partial exclusion of previous queries. If we retrieve a set from the
database, we only need to exclude previously retrieved sets that overlap with the current set,
i.e., those with $Q[g_i] \cap Q[g_j] \neq \emptyset$.

We would further like to mention that, since candidates are wild-card selections, testing subsumption
and overlap reduces to simple comparison operations on the respective type symbols.
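The overlap test used for partial exclusion is equally simple on wild-card terms. This Python sketch (hypothetical names; dict-based terms with "top" as $\top$) decides overlap at the level of selection conditions. If the conditions of two candidates cannot both hold, then $Q[g_i] \cap Q[g_j] = \emptyset$ for certain. Overlapping conditions may still select disjoint data, so the test is conservative: it never skips an exclusion that is needed.

```python
TOP = "top"  # wild card


def terms_overlap(c, d):
    """The conditions of c and d can both hold unless some occurrence
    selects two different non-top symbols."""
    return all(c[a] == TOP or d[a] == TOP or c[a] == d[a] for a in c)


def previous_to_exclude(cand, previous):
    """Partial exclusion: keep only earlier candidates that may overlap `cand`."""
    return [prev for prev in previous
            if any(terms_overlap(c, d) for c in cand for d in prev)]


previous = [[{"who": TOP, "what": "sweets"}], [{"who": "john", "what": "apples"}]]
current = [{"who": "mary", "what": TOP}]
print(previous_to_exclude(current, previous))
```

Only the first earlier candidate needs a `NOT (...)` clause here: the second one fixes `who` to john, which contradicts the current selection of mary.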

7 Conclusion 

We have presented an overview of a formal design for interfacing a logical query language with complex
objects to a relational database. Our system improves on previous systems in that it
provides database storage for objects ordered by a subtype hierarchy, representing part
of this hierarchy in the database as well. The representation of the objects is flexible: arbitrarily
nested objects can be represented in a maximally compressed format, where compression and
decompression are handled by the interface. The loading algorithm is efficient in that it
loads only objects actually needed by the LIFE system and never loads the same object twice,
thus improving on the results in [10]. In addition, our design improves on previous work by
providing for free the ability, intrinsic to $\psi$-terms, to store and query partial information. For
example, if LIFE's EDB stipulates that all students are happy, a query requesting to
list happy things will avoid itemizing in extenso all 12,452 tuples of students, giving only the
one tuple corresponding to the intensional LIFE fact $happy(student)$.

LIFE is an extension of logic programming: first-order logic programs are LIFE programs
with a flat type signature, i.e., one in which all type symbols except $\top$ and $\bot$ are incomparable. Hence,
the retrieval algorithm also holds for languages using Prolog terms as objects.

Part of the system described in this paper has been implemented: the LIFE-WISDOM system
(LIFE With Inheritance Supported Data Object Management) implements a database interface
between wild_LIFE [3], an implementation of LIFE, and an ORACLE relational database [12].
The current system implements both database retrieval and updates, but only for single
inheritance and facts consisting of least sorts.
As for the future, we want to extend this approach to goals with variables. For example,
a goal such as $name(X, X)$ must only unify with facts with identical arguments, and should
generate database queries retrieving only tuples with identical values in the corresponding columns. Then, we
may translate entire LIFE rules to complex join operations on the database. The translation of
recursive LIFE rules to extended relational algebra expressions must also be explored. Another
direction of research consists of weakening the restrictions for the reduced type signature, by
redefining qualified segments and using other search strategies, such as segment guessing.
Also, we may consider iterating our construction, building multiple levels of abstraction, i.e.,
storing the qualifiers themselves in higher-level qualified segments.